Using Character-Level Sequence-to-Sequence Model for Word Level Text Generation to Enhance Arabic Speech Recognition

Authors

Abstract

Owing to the linguistic richness of the Arabic language, which contains more than 6000 roots, building a reliable language model for Arabic speech recognition systems faces many challenges. This paper introduces a language-model-free automatic speech recognition system for Modern Standard Arabic based on an end-to-end-based Deep Speech architecture developed by Mozilla. The proposed system uses a character-level sequence-to-sequence model to map the character alignment produced by the recognizer onto the corresponding words. The system outperformed recent studies on single-speaker and multi-speaker Arabic speech recognition using two different state-of-the-art datasets. The first was the Multi-Genre Broadcast (MGB2) corpus with 1200 h of audio data from multiple speakers. The system achieved a new milestone in the MGB2 challenge with a word error rate (WER) of 3.2, outperforming related work on the same corpus with a reduction of 17%. An additional experiment on the 7-hour Saudi Accent Single Speaker Corpus (SASSC) was used to build a single male speaker-based model using the same network architecture. The experiments achieved a WER of 4.25 with a relative improvement of 33.8%.
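The abstract reports its results as word error rate (WER) and relative improvement. As a rough illustration of how such figures are commonly computed, the sketch below uses the standard word-level edit-distance definition of WER. It is not taken from the paper: the function names and the example values are hypothetical, and the authors' exact scoring setup may differ.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """WER = (substitutions + deletions + insertions) / reference word count.

    Illustrative helper only; tokenization here is plain whitespace splitting.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            cost = 0 if ref_word == hyp_word else 1
            curr[j] = min(
                prev[j] + 1,         # deletion of a reference word
                curr[j - 1] + 1,     # insertion of a hypothesis word
                prev[j - 1] + cost,  # match or substitution
            )
        prev = curr
    return prev[-1] / len(ref)


def relative_reduction(baseline_wer: float, new_wer: float) -> float:
    """Fraction by which new_wer improves on baseline_wer."""
    if baseline_wer <= 0:
        raise ValueError("baseline_wer must be positive")
    return (baseline_wer - new_wer) / baseline_wer


# Hypothetical values, not results from the paper.
example_wer = word_error_rate("a b c d", "a x c")
example_gain = relative_reduction(10.0, 8.0)
```

In this toy case the hypothesis has one substituted word and one missing word against a four-word reference, and the second call compares two made-up WER values.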

Similar resources

Arabic Character Recognition using Approximate Stroke Sequence

Handwritten Arabic character recognition is addressed. A novel approach to Arabic character recognition is presented, based on statistical analysis of a typical Arabic text. Results showed that the sub-word, rather than the word, is the basic pictorial block in the Arabic language. The method of approximate stroke sequence is applied for the recognition of some Arabic characters i...

Character-Level Linguistic Features Extraction for Text-to-Speech System

High-quality linguistic features are the key to the success of speech synthesis. Traditional linguistic feature extraction methods usually rely on a word-level natural language processing (NLP) parser. Since a good parser requires a lot of feature engineering to build, it is usually a general-purpose one and often not specially designed for speech synthesis. To avoid these difficulties, we...

An online sequence-to-sequence model for noisy speech recognition

Generative models have long been the dominant approach for speech recognition. The success of these models, however, relies on the use of sophisticated recipes and complicated machinery that is not easily accessible to non-practitioners. Recent innovations in Deep Learning have given rise to an alternative: discriminative models called Sequence-to-Sequence models, which can almost match the accur...

Morphological Inflection Generation Using Character Sequence to Sequence Learning

Morphological inflection generation is the task of generating the inflected form of a given lemma corresponding to a particular linguistic transformation. We model the problem of inflection generation as a character sequence to sequence learning problem and present a variant of the neural encoder-decoder model for solving it. Our model is language independent and can be trained in both supervis...

Sequence to Sequence Learning for Optical Character Recognition

We propose an end-to-end recurrent encoder-decoder based sequence learning approach for printed text Optical Character Recognition (OCR). In contrast to the existing state-of-the-art OCR solution [Graves et al. (2006)], which uses a CTC output layer, our approach makes minimalistic assumptions on the structure and length of the sequence. We use a two-step encoder-decoder approach – (a) A recur...

Journal

Journal title: IEEE Access

Year: 2023

ISSN: 2169-3536

DOI: https://doi.org/10.1109/access.2023.3302257